De novo screening of disease-resistant genes from the chromosome-level genome of rare minnow using CRISPR-Cas9 random mutation

Abstract
Background: Mutants are important for the discovery of functional genes and the creation of germplasm resources. Mutant acquisition depends on the efficiency of mutation technology and screening methods. CRISPR-Cas9 is an efficient gene editing technology that is mainly used for editing a few genes or target sites; it has not yet been applied to the construction of random mutant libraries or to the de novo discovery of functional genes.
Results: In this study, we first sequenced and assembled the chromosome-level genome of wild-type rare minnow (Gobiocypris rarus) as a susceptible model of hemorrhagic disease, obtained a 956.05 Mb genome sequence, assembled the sequence into 25 chromosomes, and annotated 26,861 protein-coding genes. Thereafter, CRISPR-Cas9 technology was applied to randomly mutate the whole genome of rare minnow, with the conserved bases (TATAWAW and ATG) of the promoter and coding regions as the target sites. The survival rate of rare minnow after hemorrhagic disease infection gradually increased from 0% (the entire wild-type population died after infection) to 38.24% (F3 generation). Finally, seven susceptibility genes were identified via comparative genome analysis and cell-level verification based on the rare minnow genome.
Conclusions: The results provide genomic resources for wild-type rare minnow and confirm that the random mutation system designed using CRISPR-Cas9 technology in this study is simple and efficient, and is suitable for the de novo discovery of functional genes and the creation of germplasm resources related to qualitative traits.


Introduction
Rare minnow (Gobiocypris rarus) belongs to the order Cypriniformes and family Cyprinidae, and it has the advantages of a small body, fast reproduction, and easy feeding. It is more sensitive to some pollutants compared to zebrafish (Danio rerio) and medaka (Oryzias latipes). For example, the sensitivity of rare minnow to 17α-ethinylestradiol and pentachlorophenol is higher than that of zebrafish, and its sensitivity to ethinylestradiol is higher than that of medaka [1][2][3]. Therefore, it has been widely employed in genetics, physiology, biological monitoring, toxicity testing, and other fields [4].
The mortality rate of rare minnow infected with grass carp (Ctenopharyngodon idellus) reovirus (GCRV) is 100% [5]. Grass carp, which also belongs to the family Cyprinidae, is one of the most important freshwater fishes worldwide. The mortality of grass carp hemorrhagic disease caused by GCRV infection is more than 80% [6], which poses a great threat to the development of the aquaculture industry. Rare minnow, similar to grass carp, is highly sensitive to GCRV, which makes it an ideal model for studying grass carp hemorrhagic disease and exploring germplasm resources.
Research on efficient mutation methods is a prerequisite for constructing an ideal animal model.
Traditional physical and chemical mutagenesis methods mainly cause genomic point mutations [7][8][9], which cannot be distinguished from natural SNPs, making subsequent comparative analyses of functional genomes considerably more difficult. Traditional transposon mutagenesis is strongly selective for particular regions of the recipient genome and cannot achieve random mutation of all genes [10,11]. Efficient and easy-to-detect mutation methods are important for obtaining mutants and for exploring new germplasm resources.
CRISPR-Cas9 technology is an efficient gene editing technology mainly used for editing a few genes or target sites [12,13]. It has also been used to construct mutant libraries. In previous research on human cells and rice, the main way to construct a mutant library was to design sgRNAs for all candidate genes, mix all sgRNAs, and select target mutants after knockout [14][15][16][17]. With this approach, a large number of sgRNAs must be designed at a high cost, and it is only suitable for constructing a mutant library from known candidate genes. To date, CRISPR-Cas9 technology has not been applied to the construction of random mutant libraries or to the de novo discovery of functional genes.
All wild-type rare minnows die after being infected with GCRV, which makes the species an excellent material for screening mutants with GCRV resistance: an individual that survives infection is likely to carry a successful mutation. In this study, we assembled a high-quality genome of rare minnow, then used CRISPR-Cas9 technology to randomly mutate the whole genome of rare minnow and obtained a mutant population with GCRV resistance traits. Next, we obtained seven hemorrhagic disease susceptibility genes via comparative genome analysis and experimental verification. The results not only provide genomic resources for research on rare minnow, but also establish a simple and feasible method for random genomic mutation that is suitable for the exploration of functional genes and new germplasm resources.

Genome assembly and annotation
To initially evaluate the genome of rare minnow, we obtained 124.20 Gb of raw data and 121.11 Gb of clean data after routine filtering. Based on K-mer (K = 21) analysis, the genome size was estimated to be 943.44 Mb, with a heterozygosity rate of 0.41% and a repeat content of 35.82%. These results indicate that the rare minnow genome is a simple rather than a complex genome. A heat map of Hi-C interaction signals was drawn to evaluate the Hi-C assembly (Fig. 1a). The interaction signals in the heat map clearly distinguished the 25 chromosomes, indicating that the genome was well assembled.
We annotated 43.14% of the rare minnow genome as repetitive sequences (Additional File 1: Table).
In addition, 36,387 mRNAs were annotated, corresponding to 26,861 genes, of which 4,957 genes had alternatively spliced transcripts. The average length of the longest CDS of all genes was 1.82 kb, which was close to that of zebrafish and higher than that of grass carp and blunt snout bream (Megalobrama amblycephala) (Additional File 2: Table).

Evolutionary analysis of the genomes
Cluster analysis of the gene families of 13 species yielded 20,723 gene families, including 2,376 shared gene families and 144 single-copy gene families; 17,364 genes were shared. A phylogenetic tree was constructed using all single-copy gene families (Fig. 1b). The four Cyprinidae species clustered into one branch, and the divergence time of rare minnow and grass carp was estimated at 67.04 MYA (Fig. 1b).
Collinearity analysis showed that 18,968 similar genes were located in 97 supercontigs of grass carp.
The linkage groups of the 97 supercontigs of grass carp were mapped to the genome of the rare minnow (Fig. 1c). Chromosomes 1 and 21 (LG1 and LG21) of rare minnow correspond to LG13 of grass carp, and the degree of gene collinearity between the two species was very high (Fig. 1c).

Anti-hemorrhagic model of rare minnow
Twenty-six sgRNAs were mixed with Cas9 protein and injected into approximately 8,000 single-cell embryos, yielding 3,126 two-month-old P0 mutants, of which 3,000 were used in the GCRV infection experiment. Of these, 2,993 died and seven survived, giving a survival rate of 0.23%. In contrast, all 351 individuals of the wild-type control group died, giving a survival rate of 0% (Fig. 2a). During the course of the disease, the dead individuals in both the mutation and control groups exhibited a red body surface with obvious hemorrhagic symptoms (Fig. 2b).
To rule out differences in survival rate caused by experimental error, two F1 families (F1-1 and F1-2) were obtained by crossing two surviving males from the P0 generation with wild-type females.
The survival rates of F1-1 and F1-2 were 1.25% and 1.89%, respectively, and F2- while those in the F1-1 family began to die at 7 dpi. In the four GCRV-infected F2 families, mortality was higher than that of the control group at 6 dpi but lower after 8 dpi. In addition, the period over which deaths occurred in the F2 families was 3-4 days longer than in the control group (Fig. 2e). Among the seven F3 families, two (F3-1 and F3-2) died faster than the control at 5 and 6 dpi, but the mortality of all mutant families was lower than that of the control after 7 dpi.
The period over which deaths occurred in the F3 mutant families was 3-10 days longer than in the control group (Fig. 2f). Overall, compared with the control, the F1, F2, and F3 mutant families exhibited delayed death after GCRV infection.

Screening of candidate indel loci related to hemorrhagic disease
The indel loci and genotypes of 11 datasets (C, S1, L7, T1, T2, T3, and P1-P5) were analyzed using GATK v4.1.1.0. The genotypes of the indels in the T1, T2, and T3 groups were compared with those at the same positions in the control groups (C, S1, and L7) and represented by the letters T, N, and F (Fig. 3). The loci were then divided into four grades (high, moderate, low, and modifier) based on their contribution to changes in gene function. There were 147,679 (139,632 + 1,377 + 1,971 + 1,622 + 931 + 1,125 + 1,021) indels of the TTT, two T + one N, and one T + two N types (outer ring of Table). These 23 loci were associated with hemorrhagic disease. The TATAWAW and ATG targets closest to these 23 loci were analyzed. The targets of 22 loci were found to be ATG, and the target of one locus could not be determined because both TATAWAW and ATG targets were nearby (within 20 bp) (Additional File 4: Table).

Functional verification of susceptible genes related to hemorrhagic disease
According to the genome annotation information of rare minnow, 20 genes containing the 23 loci related to hemorrhagic disease were identified (Additional File 5: Table). By comparing these 20 rare minnow genes with the annotation information of the grass carp genome, 23 homologous genes in grass carp were obtained (Additional File 6: Table). siRNAs and specific primers for the 23 grass carp genes were designed, and the sequences are shown in Additional File 7. After the siRNAs were transfected into GCO cells, the relative expression level of each target gene at 48 h post-transfection was normalized to its expression level at 0 h. The results indicated that nine siRNAs had a significant inhibitory effect (p < 0.05) (Fig. 4a). To study the effects of siRNA knockdown on GCRV infection, these nine siRNAs were transfected into GCO cells, which were then infected with GCRV. RT-qPCR analysis showed that transfection of seven siRNAs significantly reduced the copy number of GCRV in GCO cells at 32 h post-transfection compared with that in the NC group (p < 0.05) (Fig. 4b). Further, the titer of GCRV in GCO cells transfected with the seven siRNAs was determined. The titer decreased significantly in the GCO cell groups treated with the seven siRNAs (p < 0.05) (Fig. 4c). These results suggest that these seven genes are indeed susceptibility genes for GCRV infection.

Discussion
In this study, the genome sequence and annotation information of rare minnow were obtained, providing a high-quality genome analysis platform for research and use in more fields.
In addition, we established a method for constructing a genome-wide random mutant library via a special application of CRISPR-Cas9, using rare minnow as a hemorrhagic disease-susceptible model (Additional File 8: Figure). This method has a wide mutation range, low cost, and high efficiency, and is suitable for functional genomics research and for the creation of germplasm resources related to qualitative traits.
To date, some studies have used CRISPR-Cas9 technology to construct mutant libraries of human cells and rice [14][15][16][17]. They designed sgRNAs within the range of existing candidate genes. The advantage of this strategy is that mutation sites are easy to detect; the disadvantage is that it requires sufficient candidate gene sequences. If the candidate genes include no gene related to the expected trait, the expected mutant cannot be obtained. The target sites in this study were designed based on the conserved bases of the gene promoter and coding region (TATAWAW and ATG) (Fig. 5), which can theoretically cover the functional regions of all genes in the genome, thus increasing the diversity of the mutant library and greatly improving the chance of obtaining mutants with the target trait. In addition, the method established in this study requires the synthesis of only 26 sgRNAs, and mutants can be obtained using efficient screening methods. However, it should be noted that almost all of the 23 hemorrhagic disease-associated loci were produced by ATG targets (Additional File 4: Table). This may be because only a minority of gene promoters contain TATA boxes. Previous studies have found that 23.85% of eukaryotic promoter sequences contain TATA boxes, and approximately 20% of yeast genes contain a TATA box [18,19]. Future studies could therefore consider using only ATG as the mutation target.
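To make the two target motifs concrete, the following minimal sketch locates TATAWAW and ATG in a DNA string, where W is the IUPAC code for A or T. The function name, the toy sequence, and the restriction to the forward strand are assumptions for illustration; the sketch only finds motif occurrences and does not reproduce the sgRNA design used in this study.

```python
import re

# Lookahead patterns so that overlapping occurrences are all reported.
# W (IUPAC) = A or T, hence the character class [AT].
TARGET_PATTERNS = {
    "TATAWAW": re.compile(r"(?=(TATA[AT]A[AT]))"),
    "ATG": re.compile(r"(?=(ATG))"),
}

def find_targets(sequence):
    """Return sorted (0-based start, motif name, matched bases) on the forward strand."""
    seq = sequence.upper()
    hits = []
    for name, pattern in TARGET_PATTERNS.items():
        for match in pattern.finditer(seq):
            hits.append((match.start(), name, match.group(1)))
    return sorted(hits)

# Hypothetical toy sequence: one TATA-box-like motif followed by two ATG sites.
toy_sequence = "GGCTATAAATGCCATGTT"
for start, name, bases in find_targets(toy_sequence):
    print(start, name, bases)
```

Because ATG occurs far more often than a seven-base TATA-box motif in any sequence, a scan of this kind returns many more ATG hits than TATAWAW hits, which is in line with the observation above that most associated loci lay near ATG targets.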
Many studies have constructed plant mutant libraries by physical and chemical mutagenesis, with mutant frequencies between 0.031% and 9.3% [20][21][22][23]. Transposon mutagenesis in Arabidopsis thaliana yielded mutants at frequencies of 0.091% and 1% [24,25]. In animals, the chemical mutagen N-ethyl-N-nitrosourea (ENU) is mainly used in relevant studies in some species, such as Caenorhabditis elegans [26,27], zebrafish [28,29], mouse (Mus musculus) [30,31], grass carp [32], and pig (Sus scrofa) [33], and the mutant frequency is generally not more than 0.03%. Compared with existing studies, the mutant frequency of this method (0.23%) is similar to that in plants but approximately 10 times higher than that in animals. Another important reason why we successfully obtained resistant mutants by this method is that the selected trait was a qualitative trait. Because the entire wild-type population of rare minnow died of hemorrhagic disease, individuals that survived infection were mutants and could be identified easily and efficiently. However, it must be pointed out that this method requires whole-genome sequence information and may not be feasible for screening quantitative traits or for species with long generation times.

Sources of experimental fish, viruses, and cells
Rare The reference number obtained was Y9110306.

Genome sequencing and assembly
A sexually mature female rare minnow was selected for this study. Part of the muscle tissue was frozen in liquid nitrogen, and genomic DNA was extracted from the other part using the cetyltrimethylammonium bromide method. Next-generation sequencing (NGS) was performed on an Illumina HiSeq X Ten platform using 150 bp paired-end (PE) reads with an insert size of 350 ± 50 bp. After conventional filtering, a K-mer frequency distribution map was drawn based on K-mer (K = 21) analysis, and the genome size, heterozygosity, and repeat content were evaluated. The PacBio Sequel system was used for third-generation sequencing (TGS). Subreads were obtained using signal-to-noise ratio (SNR) filtering. After the subreads were self-corrected with Canu v1.9 (Canu, RRID:SCR_015880) [34], WTDBG v1.2.8 (WTDBG, RRID:SCR_017225) [35] was used for sequence assembly. The assembled genome sequence was then corrected with Pilon v1.23 (Pilon, RRID:SCR_014731) [36] using the NGS data generated for genome evaluation.
The muscle tissue cryopreserved in liquid nitrogen was fixed and crosslinked with formaldehyde, and a Hi-C library was constructed. NGS was performed using an Illumina HiSeq X Ten platform. Clean data were obtained after routine filtering and aligned to the assembled genome sequence. The alignments were filtered using HiC-Pro v2.11.1 (HiC-Pro, RRID:SCR_017643) [37] to obtain valid interaction pairs. Based on the valid interaction pairs, the genome assembled in the previous step was clustered, ordered, and oriented using LACHESIS (LACHESIS, RRID:SCR_017644) [38], and the chromosome-level assembly was obtained. The completeness of the genome was evaluated with BUSCO v5.2.2 (BUSCO, RRID:SCR_015008) [39] using the actinopterygii_odb10 gene set. The number of Hi-C read pairs covering any two bins was then used as the intensity signal of the interaction between the two bins, and a heat map was drawn to evaluate the Hi-C assembly results.
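The K-mer based genome survey mentioned above can be illustrated with a small sketch. It uses the common approximation that genome size is roughly the total number of K-mers divided by the depth of the main peak of the K-mer frequency histogram. The function names, the toy reads, the toy histogram, and the `min_depth` cutoff are assumptions for illustration; this is not the software used in the study, and it does not estimate heterozygosity or repeat content.

```python
from collections import Counter

def kmer_histogram(reads, k):
    """Count each k-mer in the reads, then tally how many distinct k-mers occur at each depth."""
    kmer_counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1
    return Counter(kmer_counts.values())  # depth -> number of distinct k-mers

def estimate_genome_size(histogram, min_depth=2):
    """Approximate genome size as total k-mers / main peak depth.

    K-mers below min_depth are ignored because they are mostly sequencing errors
    (an assumed cutoff for this sketch).
    """
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth, peak_depth

# Toy histogram (hypothetical): error k-mers at depth 1, main peak at depth 20.
toy_histogram = {1: 500, 19: 10, 20: 100, 21: 10}
size, peak = estimate_genome_size(toy_histogram)
print(f"peak depth = {peak}, estimated size = {size:.0f} bp")
```

In practice the histogram would be produced by a dedicated K-mer counter over the full clean data set with K = 21, and the same division would be applied to the resulting counts.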

Genome annotation
Genome annotation was performed in two parts: repetitive sequence annotation and coding gene annotation.

Establishment of an anti-hemorrhagic disease model
The 26 sgRNAs were mixed with Cas9 protein (Invitrogen, USA) at final concentrations of 400 ng/μL and 100 ng/μL. Each sgRNA was injected into approximately 300 rare minnow embryos, which constituted the P0 generation. At 2 months of age, a high-salt invasion method was used for GCRV infection: the fish were soaked in 6% NaCl solution for 2 min and then quickly transferred to a GCRV suspension (virus titer: 2.75 × 10^8 TCID50/mL) for 30 min. A wild-type mixed population used as the control group was infected in the same manner. The number of dead fish in each group was recorded daily.
From the surviving individuals of the P0 generation, males were selected and crossed with wild-type females to obtain F1 full-sib families. GCRV infection was performed at 2 months of age. The surviving individuals in the F1 full-sib family with the highest survival rate were self-crossed to construct F2 full-sib families. The F3 generation was obtained by self-crossing in the same way and was infected with GCRV. A wild-type mixed population was used as the control group for infection. The numbers of deaths in the F1-F3 populations and in the wild-type population were counted every day after infection, and individuals that had not died after two consecutive weeks were recorded as survivors. Cumulative mortality curves were drawn, and the survival rate of each family was calculated.
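The cumulative mortality curve and survival rate described above reduce to simple arithmetic on the daily death counts. The sketch below reproduces the P0 survival rate reported in the Results (2,993 deaths among 3,000 infected fish); the function names and the day-by-day distribution of deaths are hypothetical, since only the totals are given in the text.

```python
def cumulative_mortality(daily_deaths, n_infected):
    """Cumulative mortality (%) at the end of each day post-infection."""
    curve, dead = [], 0
    for deaths in daily_deaths:
        dead += deaths
        curve.append(100.0 * dead / n_infected)
    return curve

def survival_rate(daily_deaths, n_infected):
    """Survival rate (%) at the end of the observation period."""
    return 100.0 * (n_infected - sum(daily_deaths)) / n_infected

# Hypothetical daily deaths for the P0 group; only the total (2,993 of 3,000) is from the text.
p0_daily_deaths = [0, 0, 0, 0, 500, 1200, 900, 300, 93]
print(f"P0 survival rate: {survival_rate(p0_daily_deaths, 3000):.2f}%")
print([round(value, 1) for value in cumulative_mortality(p0_daily_deaths, 3000)])
```

Plotting the list returned by `cumulative_mortality` against days post-infection gives one curve per family, which allows the onset and duration of deaths to be compared with the control group.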

Screening of candidate indels associated with hemorrhagic disease
Three surviving individuals were randomly selected from each of three families with high disease resistance in the F2 generation (F2-2, F2-3, and F2-4). Three wild-type females and three wild-type males were also selected. Genomic DNA was extracted from these 15 fish using the high-salt method. Sequencing libraries T1, T2, and T3 were constructed by pooling the DNA of the three fish from F2-2, F2-3, and F2-4, respectively, and sequencing library C was constructed by pooling the DNA of the six wild-type individuals. The insert size was 350 ± 50 bp, and NGS was performed on the BGI MGISEQ-2000 platform with PE150 reads. Five parents of the F2 families (F1 surviving mutants P1-P5) were sequenced in the same manner. In addition, NGS data from two wild-type groups (S1 and L7) were collected from our laboratory to enrich the control information. S1 was obtained from a mixed sample of one wild-type female and one wild-type male, and L7 was obtained from a wild-type male sample. The genotypes of each indel locus in the three control samples (C, S1, and L7) were combined and compared with the corresponding indels in the eight test samples (T1, T2, T3, and P1-P5). In each of the eight samples, a locus with a new genotype was recorded as "T", a locus without a genotyping result was recorded as "N", and a locus with a genotyping result but without a new genotype was recorded as "F". The contribution of each locus to changes in gene function was taken from the SnpEff annotation, which distinguishes four impact levels: high, moderate, low, and modifier (https://pcingola.github.io/SnpEff/se_inputoutput/#impact-prediction). Screening then proceeded in three steps: first, loci that were not of the "F" type in T1, T2, and T3 were selected; second, from these, loci that were not of the "F" type in the five parents were selected; third, loci with "high" contribution were selected.
Finally, the candidate loci associated with hemorrhagic disease were identified.
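The T/N/F coding and the three-step screen can be sketched as follows. This is a minimal illustration on hypothetical loci, not the authors' pipeline: the data layout, the function names, and the locus identifiers are assumptions. Step 1 is read here as "no F and at least one T among T1-T3", which matches the TTT, two T + one N, and one T + two N classes reported in the Results; step 2 is read as "no F among P1-P5".

```python
POOLS = ("T1", "T2", "T3")
PARENTS = ("P1", "P2", "P3", "P4", "P5")
CONTROLS = ("C", "S1", "L7")

def classify(genotype, control_genotypes):
    """T = new genotype, N = no genotyping result, F = genotype also seen in controls."""
    if genotype is None:
        return "N"
    return "F" if genotype in control_genotypes else "T"

def screen(loci):
    """Apply the three screening steps and return the ids of the retained loci."""
    kept = []
    for locus in loci:
        genotypes = locus["genotypes"]
        control_set = {genotypes[c] for c in CONTROLS if genotypes.get(c) is not None}
        pool_types = [classify(genotypes.get(s), control_set) for s in POOLS]
        parent_types = [classify(genotypes.get(s), control_set) for s in PARENTS]
        if "F" in pool_types or "T" not in pool_types:   # step 1 (assumed reading)
            continue
        if "F" in parent_types:                           # step 2 (assumed reading)
            continue
        if locus["impact"] != "HIGH":                     # step 3: SnpEff impact
            continue
        kept.append(locus["id"])
    return kept

def make_locus(locus_id, impact, **overrides):
    """Build a hypothetical locus: controls 0/0, pools and parents 0/1 unless overridden."""
    genotypes = {name: "0/0" for name in CONTROLS}
    genotypes.update({name: "0/1" for name in POOLS + PARENTS})
    genotypes.update(overrides)
    return {"id": locus_id, "impact": impact, "genotypes": genotypes}

toy_loci = [
    make_locus("locus_A", "HIGH", T3=None),        # T, T, N in pools: retained
    make_locus("locus_B", "HIGH", T1="0/0"),       # F in a pool: dropped at step 1
    make_locus("locus_C", "MODERATE"),             # dropped at step 3
    make_locus("locus_D", "HIGH", P3="0/0"),       # F in a parent: dropped at step 2
]
print(screen(toy_loci))
```

In a real analysis the genotypes would come from the GATK call set and the impact level from the SnpEff annotation of each indel.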

Functional verification of susceptible genes related to hemorrhagic disease
The genome annotation information of the candidate loci of rare minnow was used to obtain the genes corresponding to these sites. The cDNA sequences of these genes were then compared with the annotation of the grass carp genome [62], and homologous genes in grass carp were selected. To verify the function of these candidate loci quickly and in a preliminary manner, we carried out knockdown experiments in grass carp cells. For each homologous grass carp gene, siRNA was designed and synthesized by RiboBio Co. (Guangzhou). qPCR primers for the homologous grass carp genes were designed to confirm the knockdown effect of the siRNA, and RT-qPCR was performed to detect changes in GCRV RNA relative to the negative control.
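The figure legend for these experiments names the 2^−ΔΔCt method for relative quantification. A minimal sketch of that calculation is given below; the function name and the Ct values are hypothetical and serve only to show the arithmetic.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Relative expression of the target in treated vs. control samples by 2^-ddCt."""
    delta_treated = ct_target_treated - ct_ref_treated    # dCt, siRNA-treated sample
    delta_control = ct_target_control - ct_ref_control    # dCt, negative control (NC)
    delta_delta = delta_treated - delta_control           # ddCt
    return 2.0 ** -delta_delta

# Hypothetical Ct values: the target crosses the threshold 2 cycles later after knockdown.
fold = relative_expression(ct_target_treated=26.0, ct_ref_treated=18.0,
                           ct_target_control=24.0, ct_ref_control=18.0)
print(f"relative expression vs. NC: {fold:.2f}")
```

A value below 1 indicates lower target RNA in the treated sample than in the negative control, after normalization to the internal reference gene.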
GCO cells transfected and infected in the same way were transferred to −70 °C and subjected to two freeze-thaw cycles to collect viral samples. CIK cells were then seeded into 96-well plates at 5,000 cells per well.
After 24 h, the cells in each well were infected with 100 μl of viral sample from 10-fold serial dilutions in culture medium and incubated for 3 days. CPE was then observed under the microscope, and the titer was determined using the Reed-Muench formula [63] and expressed as TCID50/ml.
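For readers unfamiliar with the Reed-Muench calculation, the sketch below works through it on a hypothetical 10-fold dilution series. The function name, the number of wells per dilution, and the CPE counts are assumptions for illustration; only the 100 μl inoculum per well is taken from the text.

```python
import math

def reed_muench_log10_tcid50_per_ml(log10_dilutions, cpe_positive, wells, inoculum_ml=0.1):
    """Return log10(TCID50/ml) from CPE counts by the Reed-Muench method."""
    # Order rows from the most concentrated to the most dilute sample.
    rows = sorted(zip(log10_dilutions, cpe_positive, wells), key=lambda r: r[0], reverse=True)
    logs = [r[0] for r in rows]
    positive = [r[1] for r in rows]
    negative = [r[2] - r[1] for r in rows]
    n = len(rows)
    # Positives accumulate toward the concentrated end, negatives toward the dilute end.
    cum_positive = [sum(positive[i:]) for i in range(n)]
    cum_negative = [sum(negative[:i + 1]) for i in range(n)]
    percent = [100.0 * p / (p + q) for p, q in zip(cum_positive, cum_negative)]
    for i in range(n - 1):
        if percent[i] >= 50.0 > percent[i + 1]:
            proportionate_distance = (percent[i] - 50.0) / (percent[i] - percent[i + 1])
            step = logs[i] - logs[i + 1]              # log10 of the dilution factor
            log10_endpoint = logs[i] - proportionate_distance * step
            return -log10_endpoint - math.log10(inoculum_ml)
    raise ValueError("the 50% endpoint is not bracketed by the dilution series")

# Hypothetical plate: 8 wells per dilution, dilutions 10^-3 to 10^-7.
log10_titer = reed_muench_log10_tcid50_per_ml(
    log10_dilutions=[-3, -4, -5, -6, -7],
    cpe_positive=[8, 8, 6, 2, 0],
    wells=[8, 8, 8, 8, 8],
)
print(f"titer = 10^{log10_titer:.1f} TCID50/ml")
```

In this toy plate the cumulative percentages are 100, 100, 80, 20, and 0, so the 50% endpoint lies halfway between the 10^-5 and 10^-6 dilutions; correcting for the 0.1 ml inoculum gives 10^6.5 TCID50/ml.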

Data Availability
Raw sequences for genome assembly including Illumina, PacBio and

Fig. 4 (partial legend): ... actin as the internal reference gene, the relative expression of GCRV RNA relative to NC was detected by the 2^−ΔΔCt method. c. CIK cells were seeded into 96-well plates. Then, the cells were infected with viral samples (from GCO cells transfected and infected as in b) for 3 days. CPE was then observed under the microscope, and the titer was determined using the Reed-Muench formula. Data represent results of three independent experiments, and error bars indicate mean ± SD. Statistical analyses were performed using multiple t-tests (n = 3), and asterisk indicates P < 0.05.